Long-read sequence assembly of the firefly Pyrocoelia pectoralis genome

Authors

  • Xinhua Fu
  • Jingjing Li
  • Yu Tian
  • Weipeng Quan
  • Shu Zhang
  • Qian Liu
  • Fan Liang
  • Xinlei Zhu
  • Liangsheng Zhang
  • Depeng Wang
  • Jiang Hu
Abstract

Background: Fireflies (family Lampyridae) are beetles of the order Coleoptera, and they are among the best-known and best-loved insects because of their bioluminescence. However, fireflies are in danger of extinction because of the massive destruction of their habitat. To improve the understanding of fireflies and protect them effectively, we sequenced the whole genome of the terrestrial firefly Pyrocoelia pectoralis.

Findings: We developed a highly reliable genome resource for the terrestrial firefly Pyrocoelia pectoralis (E. Oliv., 1883; Coleoptera: Lampyridae) using single-molecule real-time (SMRT) sequencing on the PacBio Sequel platform. In total, 57.8 Gb of long reads were generated and assembled into a 760.4-Mb genome, which is close to the estimated genome size and covers 98.7% complete and 0.7% partial insect Benchmarking Universal Single-Copy Orthologs (BUSCOs). K-mer analysis showed that the genome is highly heterozygous. Nevertheless, the long-read assembly shows good contiguity, with a contig N50 of 3.04 Mb and a longest contig of 13.69 Mb. Furthermore, 135 589 simple sequence repeats (SSRs) and 341 Mb of repeat sequences were detected. A total of 23 092 genes were predicted, and 88.44% of them were annotated with one or more related functions.

Conclusions: We assembled a high-quality firefly genome, which will not only provide insights into the conservation and biodiversity of fireflies but also offer a wealth of information for studying the mechanisms of their sexual communication, bioluminescence, and evolution.
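For readers unfamiliar with the contiguity metric quoted in the abstract, the following is a minimal sketch of how a contig N50 is conventionally computed, followed by two back-of-the-envelope ratios derived from the abstract's own figures. The function name `n50` and the toy contig lengths are illustrative assumptions, not the authors' pipeline, and the ratios assume 1 Gb = 1000 Mb and use the assembly size rather than the estimated genome size as the denominator.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together account for at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical values, not taken from the paper).
toy_contigs = [8, 5, 4, 2, 1]
toy_n50 = n50(toy_contigs)

# Figures quoted in the abstract.
read_bases_gb = 57.8   # total long-read yield
assembly_mb = 760.4    # assembled genome size
repeat_mb = 341.0      # detected repeat sequence

# Approximate read depth relative to the assembly (assumes 1 Gb = 1000 Mb).
coverage = read_bases_gb * 1000 / assembly_mb
# Share of the assembly annotated as repeats.
repeat_fraction = repeat_mb / assembly_mb

print(f"toy N50: {toy_n50}")
print(f"approximate read depth: {coverage:.1f}x")
print(f"repeat fraction: {repeat_fraction:.1%}")
```

Under these assumptions the reported yield corresponds to roughly 76-fold read depth over the assembly, and repeats make up a little under half of the assembled sequence.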

Similar resources

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...

Discovery of the female of Pyrocoelia prolongata in Taiwan (Coleoptera, Lampyridae)

The female of Pyrocoelia prolongata Jeng & Lai, a diurnal lampyrid species from Taiwan, is described for the first time. A single individual was found in a small, shady, dry streambed at the edge of a mixed forest at 2700 m elevation. The individual glowed in darkness and would move its abdomen up and down when disturbed and as a deterring behavior. A key to the females of the species of Pyroco...

Genome size of 14 species of fireflies (Insecta, Coleoptera, Lampyridae)

Eukaryotic genome size data are important both as the basis for comparative research into genome evolution and as estimators of the cost and difficulty of genome sequencing programs for non-model organisms. In this study, the genome sizes of 14 species of fireflies (Lampyridae) (two genera in Lampyrinae, three genera in Luciolinae, and one genus in subfamily incertae sedis) were estimated by pro...

Accurate Long-Read Alignment using Similarity Based Multiple Pattern Alignment and Prefix Tree Indexing

Ongoing research in sequencing technology has yielded machines that can produce sequence data on the order of one billion base pairs (bp) per machine day, with an average read length of less than 100 bp per read ("short reads"). In the past two years, many efficient algorithms have been developed for short-read alignment against a reference genome and for genome assembly, for an o...

Haplotype and Repeat Separation in Long Reads

Resolving the correct structure and succession of highly similar sequence stretches is one of the main open problems in genome assembly. For non-haploid genomes, this includes determining the sequences of the different haplotypes. For all but the smallest genomes, it also involves separating different repeat instances. In this paper we discuss methods for resolving such problems in third generati...

Journal title:

Volume 6  Issue 

Pages  -

Publication date 2017